Aegis Open Architecture (AOA)


•  Aegis  Open  Architecture  goals 

- Provide an opportunity to modernize the Aegis Weapon System design

-  Re-architect  applications  to  improve  openness  and  modularity 

-  Employ  COTS  products  when  possible  and  practical  for  system 
services 

-  Unify  a  fault-tolerant  design  across  subsystems 

Open Architecture System Management (OASM)


•  OASM  Goals 

-  Improve  consistency  of  and  access  to  system  status  information 

-  Modernize  approach  to  managing  COTS  computing  equipment 

-  Modernize  the  approach  to  Application  Management 

•  Provide  framework  for  fault-tolerant  application  designs 

•  Employ  COTS  high-availability  software  if  possible  and  practical 

•  Adhere  to  industry  standards 

OASM Application Management Scope

•  Configure  applications  &  computing  nodes  into  a  system 

-  Define  SW  hierarchy,  redundancy  models,  SW  assignment  to  nodes 

•  Launch,  monitor,  terminate  SW  components 

•  Recover  from  HW  &  SW  failures 

-  Form  &  monitor  computing  cluster  (from  configured  nodes) 

-  Assign/re-assign  active  &  standby  roles  for  SW  components 

-  Clean  up  after  erroneous  SW  component  terminations 

OASM Application Management Scope (continued)

•  Attempt  repair  (restart/reboot)  to  reinstate  SW  components  &  nodes 

•  Interface  to  management  clients 

-  Provide  SW  &  node  cluster  structure  &  status 

-  Provide  administrative  controls  (startup,  shutdown,  repair) 

•  Provide  High  Availability  (HA)  services  to  running  SW  components 

-  Component  cooperates  with  OASM  in  monitoring  its  own  health 

-  Component  receives  its  HA  state  (active  or  standby)  from  OASM 

-  Component  issues  &  subscribes  to  notifications  via  OASM 

•  Abnormal  &  state-change  events 

-  Component  can  obtain  any  subsystem’s  availability  state  from  OASM 
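
The role assignment and recovery behavior listed in the scope above can be illustrated with a small sketch. It models a 2N-style group (one active, one standby) that promotes the standby when the active unit fails. This is an illustration only, not the OASM or SAF implementation; the names `ServiceUnit`, `TwoNGroup`, and `report_failure` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    UNASSIGNED = "unassigned"


@dataclass
class ServiceUnit:
    name: str
    node: str
    healthy: bool = True
    ha_state: HAState = HAState.UNASSIGNED


class TwoNGroup:
    """Hypothetical 2N group: at most one active and one standby unit."""

    def __init__(self, units: List[ServiceUnit]):
        self.units = units
        self.assign_roles()

    def assign_roles(self) -> None:
        # Failed units lose their assignment.
        for unit in self.units:
            if not unit.healthy:
                unit.ha_state = HAState.UNASSIGNED
        healthy = [u for u in self.units if u.healthy]
        # Keep a healthy active unit in place; otherwise promote the
        # standby; otherwise fall back to the first healthy unit.
        active = next((u for u in healthy if u.ha_state is HAState.ACTIVE), None)
        if active is None:
            active = next((u for u in healthy if u.ha_state is HAState.STANDBY), None)
        if active is None and healthy:
            active = healthy[0]
        standby_assigned = False
        for unit in healthy:
            if unit is active:
                unit.ha_state = HAState.ACTIVE
            elif not standby_assigned:
                unit.ha_state = HAState.STANDBY
                standby_assigned = True
            else:
                unit.ha_state = HAState.UNASSIGNED

    def report_failure(self, name: str) -> None:
        # Mark the named unit as failed and re-assign roles.
        for unit in self.units:
            if unit.name == name:
                unit.healthy = False
        self.assign_roles()

    def active(self) -> Optional[ServiceUnit]:
        return next((u for u in self.units if u.ha_state is HAState.ACTIVE), None)
```

In this sketch a failed unit simply stays unassigned; a fuller model would also attempt the restart/reboot repair step before giving up on the unit.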

Open Standards Analysis


•  A  set  of  evaluation  criteria  was  established 

• Each standard was evaluated on its ability to meet those criteria

•  Example  criteria: 

-  Ability  to  logically  group  applications 

-  Ability  to  start/stop  applications  in  groups 

-  Allow  user  specification  of  application  dependencies 

-  Support  application  failovers 

•  A  Service  Availability  Forum  (SAF)  standard  was  ultimately  selected 
based  on  the  evaluation  criteria 

Service Availability Forum (SAF)


• The SAF is a consortium that promotes open standards for mission-critical systems

•  The  SAF’s  goals  align  well  with  DoD  goals: 

-  Faster  time  to  market  for  applications 

-  Reduced  life-cycle  costs 

-  Simplified  introduction  of  “best  in  breed”  software  components 

-  “There  is  no  upside  to  downtime” 

• The SAF specifications consist largely of two parts:

-  Hardware  Platform  Interface  -  HPI 

-  Application  Interface  Standard  -  AIS 

SAF Application Interface Specifications


• AIS Specifications

- Cluster Membership Service

- Availability Management Framework

- Notification Service

- Event Service

- Checkpoint Service

- Log Service

- Information Model Management

- Messaging Service

- Naming Service

- Platform Management Service

- Security Service

- Software Management Framework

- Timer Service

- Lock Service


Visit  http://www.saforum.org  for  all  SAF  specifications  & 
tutorials 

OASM Application Management and the AIS


•  OASM  App  Mgmt  is  based  on  the  Service  Availability  Forum  (SAF) 
Application  Interface  Specification  (AIS),  version  B.02.01,  January 
2005 

•  The  AIS  Availability  Management  Framework  (AMF) 

- Specifies a high availability model, redundancy models, auto-repair behaviors

-  Specifies  an  application  API  for  HA  management 

•  The  AIS  Notification  Service  was  also  implemented 

Key Concepts in AIS


•  Computing  Cluster 

-  Physical  nodes  that  correspond  to  actual  machines 

•  Components 

-  Generally  one  or  more  software  processes  (as  in  Unix/Linux) 

-  Lowest  level  of  managed  entity  recognized  by  AMF 

•  Service  Units 

-  Composed  of  one  or  more  components 

-  Assigned  service  instances  (work  units)  by  the  AMF 

-  Can  be  configured  to  have  redundant  instances  for  fault-tolerance 

•  Service  Groups 

-  Collection  of  Service  Units  that  protect  service  instances  against  failures 

-  Characterized  by  a  SG  Redundancy  Model  (2N,  M+N,  etc.) 

Example Cluster

[Figure: Cluster 1, two nodes with identical layouts]

Cluster 1
  Node 1
    Service Unit 1 (Service Group 1): Component 1, Component 2
    Service Unit 2 (Service Group 2): Component 3
  Node 2
    Service Unit 1 (Service Group 1): Component 1, Component 2
    Service Unit 2 (Service Group 2): Component 3

Cluster 1 contains:

- 6 Components

- 4 Service Units

- 2 Service Groups

- 2 Nodes
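
The example cluster can also be written down as a small data model, which makes the containment relationships explicit. The sketch below is illustrative only: the class names are hypothetical, and the "2N" redundancy model is an assumption, since the slide does not say which model the two service groups use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    node: str
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # assumed value, not stated in the source
    service_units: List[ServiceUnit] = field(default_factory=list)


def build_example_cluster() -> List[ServiceGroup]:
    # Each service group spans both nodes, one service unit per node.
    sg1 = ServiceGroup("Service Group 1", "2N")
    sg2 = ServiceGroup("Service Group 2", "2N")
    for node in ("Node 1", "Node 2"):
        sg1.service_units.append(
            ServiceUnit("Service Unit 1", node,
                        [Component("Component 1"), Component("Component 2")]))
        sg2.service_units.append(
            ServiceUnit("Service Unit 2", node, [Component("Component 3")]))
    return [sg1, sg2]


def summarize(groups: List[ServiceGroup]) -> Dict[str, int]:
    # Count each kind of managed entity in the cluster.
    units = [su for g in groups for su in g.service_units]
    return {
        "components": sum(len(su.components) for su in units),
        "service_units": len(units),
        "service_groups": len(groups),
        "nodes": len({su.node for su in units}),
    }
```

Calling `summarize(build_example_cluster())` reproduces the totals listed for Cluster 1.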

COTS Product Evaluations


•  A  set  of  evaluation  criteria  was  established 

•  Products  were  initially  evaluated  with  test  applications  in  an 
unclassified  environment 

-  Evaluation  included  both  functionality  and  performance 

•  Examples  of  important  criteria: 

-  Extensibility  of  the  product 

-  Ability  to  set  thread  priorities 

-  Customizable  logging  levels 

-  Vendor’s  presence  in  the  marketplace 

•  A  demonstration  of  candidate  products  was  provided  for  the 
stakeholders 

COTS Product Evaluations (continued)


•  Ultimately  GoAhead’s  SelfReliant  product  was  chosen 

• A SAF-compliant product was on GoAhead’s roadmap, but their product development schedule and our ship delivery schedules did not align

• In practice, we needed the product before it was commercially available

OASM Development


• To meet our schedule, a custom solution was developed

•  An  OASM  value-add  layer  was  created  to  insulate  the  tactical 
applications  from  the  underlying  COTS  product 

•  This  layer  provided: 

-  SAF  APIs  to  the  tactical  components 

-  The  ability  to  add  instrumentation  within  the  API 

-  An  adaptive  layer  for  future  COTS  product  insertion 
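
One way to picture such a value-add layer is as an adapter: applications call a stable API, the layer adds instrumentation, and a pluggable backend talks to whichever product sits underneath. The sketch below is a minimal illustration under that assumption. All class and method names are hypothetical; they are not the SAF API or the SelfReliant API.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, Set


class HABackend(ABC):
    """Hypothetical interface that each underlying product adapter implements."""

    @abstractmethod
    def register(self, component: str) -> None:
        ...

    @abstractmethod
    def report_health(self, component: str, healthy: bool) -> None:
        ...


class InMemoryBackend(HABackend):
    """Stand-in for a vendor product; a real adapter would call the vendor library."""

    def __init__(self) -> None:
        self.registered: Set[str] = set()
        self.health: Dict[str, bool] = {}

    def register(self, component: str) -> None:
        self.registered.add(component)

    def report_health(self, component: str, healthy: bool) -> None:
        self.health[component] = healthy


class ValueAddLayer:
    """Application-facing API: stable for callers, instrumented, backend-swappable."""

    def __init__(self, backend: HABackend) -> None:
        self._backend = backend
        self.call_counts: Counter = Counter()  # simple instrumentation hook

    def component_register(self, component: str) -> None:
        self.call_counts["component_register"] += 1
        self._backend.register(component)

    def healthcheck_confirm(self, component: str, healthy: bool = True) -> None:
        self.call_counts["healthcheck_confirm"] += 1
        self._backend.report_health(component, healthy)
```

Because applications only see `ValueAddLayer`, replacing `InMemoryBackend` with an adapter for a different product would not require application changes in this sketch.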

OASM Development (continued)


•  The  OASM  product  extended  the  functionality  of  the  basic  COTS 
product  in  the  following  areas: 

-  System  Configuration  files  used  to  create  the  System  Model  are 
validated  and  parsed  at  system  initialization 

-  Nodes  receive  requests  to  start/stop  applications  from  a  centralized 
manager 

-  Data  recording  of  system  events,  state  changes,  etc. 

-  System  model  data  is  published  via  SNMP  for  subsequent  use  by  a 
control  agent 
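
As an illustration of validating a system model at initialization, the sketch below checks a parsed configuration for a few structural errors. The dictionary layout and the specific checks are assumptions made for this example; the source does not describe the OASM configuration file format or its validation rules.

```python
from typing import Dict, List


def validate_system_model(model: Dict) -> List[str]:
    """Return a list of error strings; an empty list means the model passed."""
    errors: List[str] = []
    nodes = set(model.get("nodes", []))
    if not nodes:
        errors.append("no nodes defined")

    su_names = set()
    for su in model.get("service_units", []):
        name = su.get("name")
        if name in su_names:
            errors.append(f"duplicate service unit: {name}")
        su_names.add(name)
        # Every service unit must be assigned to a configured node.
        if su.get("node") not in nodes:
            errors.append(
                f"service unit {name} assigned to unknown node {su.get('node')}")
        # A service unit is composed of one or more components.
        if not su.get("components"):
            errors.append(f"service unit {name} has no components")

    for sg in model.get("service_groups", []):
        for member in sg.get("service_units", []):
            if member not in su_names:
                errors.append(
                    f"service group {sg.get('name')} references "
                    f"unknown service unit {member}")
    return errors
```

Collecting all errors rather than stopping at the first one lets a configuration author fix a file in a single pass.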

Management-Client Interface


•  Library  of  C++  classes  providing  access  to  OASM  Management 
information 

•  Management  information  obtained  via  SNMP  according  to  SAF 
MIBs 

•  The  Management-Client  interface  provides  access  to  a  repository  of 
system  SAF  objects 

-  Finer  granularity  of  information 


•  This  interface  provides  the  ability  to  modify  an  object’s  state  within 
the  model  from  an  outside  source 

OASM Status Tracking Service


• A System Status Tracking capability was created to allow components to monitor other components’ status

•  This  service  composes  SAF  status  into  overall  Up/Down  for  tactical 
components 

- Higher-level (coarser-grained) information

•  The  Status  Tracking  Service  is  not  a  SAF  capability,  but  a  derived 
capability  crafted  from  SAF  constructs 
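
Composing several fine-grained states into a single Up/Down value might look like the sketch below. The rule used here (a tactical component is Up when at least one of its units is enabled, instantiated, and active) is an assumption chosen for illustration; the source does not state the actual composition rule.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UnitStatus:
    operational: str  # e.g. "enabled" or "disabled"
    presence: str     # e.g. "instantiated" or "uninstantiated"
    ha_state: str     # e.g. "active", "standby", or "unassigned"


def is_serving(status: UnitStatus) -> bool:
    # Assumed rule: a unit provides service only in this combination.
    return (status.operational == "enabled"
            and status.presence == "instantiated"
            and status.ha_state == "active")


def compose_status(units: Iterable[UnitStatus]) -> str:
    """Collapse per-unit states into a single Up/Down value."""
    return "Up" if any(is_serving(u) for u in units) else "Down"
```

A subscriber then needs only the composed value, not the individual states behind it.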

Notification Service

• The Notification Service is used by a component to send a system event notification to interested subscribers

• Notifications have built-in semantics that further define the type and reason for the notification. Our implementation used two SAF notification types:

- Alarms, for example:

• Fatal exceptions

• Loss of critical resource

• Threshold limits exceeded

- State Change, for example:

• Initialization Complete

• Channel Up

• Operational Mode (Tactical, Training)

Notification Service (continued)


•  Our  implementation  utilized  DDS  as  the  message  transport 

• Notifications are recorded to aid root-cause analysis

•  Common  classes  of  notifications  were  provided  for  users 

•  Reader  utility  classes  are  provided  to  allow  for  search  and  retrieval 
of  notification  data 
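
The send, subscribe, record, and search behavior described for the Notification Service can be sketched with an in-process dispatcher. This is a simplified stand-in: the implementation described here used DDS as the transport, whereas the sketch uses plain callbacks, and every class and method name is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class NotificationType(Enum):
    ALARM = "alarm"
    STATE_CHANGE = "state_change"


@dataclass(frozen=True)
class Notification:
    type: NotificationType
    source: str
    reason: str


Callback = Callable[[Notification], None]


class NotificationBus:
    """In-process stand-in for the message transport."""

    def __init__(self) -> None:
        self._subscribers: Dict[NotificationType, List[Callback]] = {
            t: [] for t in NotificationType}
        self.recorded: List[Notification] = []

    def subscribe(self, ntype: NotificationType, callback: Callback) -> None:
        self._subscribers[ntype].append(callback)

    def send(self, notification: Notification) -> None:
        # Record every notification for later analysis, then deliver it.
        self.recorded.append(notification)
        for callback in self._subscribers[notification.type]:
            callback(notification)

    def search(self, ntype: Optional[NotificationType] = None,
               source: Optional[str] = None) -> List[Notification]:
        # Reader-style lookup over the recorded notifications.
        return [n for n in self.recorded
                if (ntype is None or n.type is ntype)
                and (source is None or n.source == source)]
```

A subscriber registers for one notification type and receives only that type, while `search` filters everything recorded, regardless of who was subscribed.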

Implementing the Product


• An OASM Integrated Product Team was established, comprising a design team, a development team, and an integration team

•  Weekly  meetings  with  the  tactical  product  areas  were  held  to 
discuss  requirements  and  the  SAF  adoption  process 

•  The  design  team  worked  to  ensure  that  the  OASM  requirements 
supported  the  product  area  requirements 

•  The  development  team  created  a  product  that  provided  the 
necessary  abstractions  between  the  SAF  API  and  the  SelfReliant 
product 

•  The  integration  team  focused  on  lab  activities  and  working  with  each 
product  area  to  ensure  successful  integration  of  the  product 

Integrating OASM into the Tactical Product Areas


•  Integrating  OASM  into  the  individual  product  areas  was  a 
challenging  task 

•  Each  product  area  contained  a  unique  resource  management 
implementation 

•  Some  product  areas  were  already  componentized  and  easily 
adopted  the  new  resource  management  scheme 

•  There  were  also  several  legacy  applications  that  required  some 
significant  rework  to  integrate  with  OASM 

Integrating Open Architecture Components


•  Tactical  applications  that  were  designed  with  an  open  architecture 
philosophy  were  easily  integrated 

• Component-based architectures fit well within the SAF

•  Component  dependencies  were  easily  established 

•  Recovery  designs  were  crafted  from  SAF  capabilities 
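
Once component dependencies are declared, a startup order that respects them can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the component names are hypothetical, and this is not how OASM or the AMF computes ordering.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def startup_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Map each component to the components it depends on; return a start order.

    Raises graphlib.CycleError (a ValueError subclass) on circular dependencies.
    """
    return list(TopologicalSorter(dependencies).static_order())


def shutdown_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    # Stop dependents before the components they rely on.
    return list(reversed(startup_order(dependencies)))


# Hypothetical components: the display needs the track manager,
# which in turn needs sensor I/O.
EXAMPLE_DEPENDENCIES = {
    "display": {"track_manager"},
    "track_manager": {"sensor_io"},
    "sensor_io": set(),
}
```

Rejecting circular dependencies when the order is computed surfaces a configuration mistake before any component is launched.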

Integrating Legacy Components


• A subset of tactical applications was carried over from prior development efforts

•  These  applications  were  based  on  legacy  designs  that  contained 
proprietary  resource  management  solutions 

•  Each  application  was  modified  to  accept  the  new  service 

- Removing the legacy system management solution

- Adding the new APIs

•  In  most  cases  the  tactical  applications  had  implemented  a  service 
layer  for  their  resource  management  capability 

•  We  found  that  legacy  recovery  requirements  fit  well  within  the  SAF 

Integrating Other Applications


•  Some  components  could  not  accept  the  OASM  API 

-  COTS  products  with  no  access  to  source  code 

-  Legacy  products  used  on  other  projects  that  did  not  have  OASM  as  the 
resource  management  service 

• A custom “wrapper” was developed to serve as a proxy for these components

-  OASM  launches  and  monitors  the  wrapper 

-  Executable  is  launched  via  the  wrapper 

-  The  wrapper  executes  the  OASM  API  on  behalf  of  the  application 
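
The wrapper approach can be sketched as a small proxy that launches an unmodified executable and reports health on its behalf. This is an illustration only: `ProcessWrapper` and the `report` callback are hypothetical stand-ins for the real wrapper and the OASM API, and the demo command just runs a sleeping Python child process.

```python
import subprocess
import sys
from typing import Callable, List, Optional

# Placeholder for the real executable; here a Python child that sleeps.
DEMO_ARGV = [sys.executable, "-c", "import time; time.sleep(30)"]


class ProcessWrapper:
    """Hypothetical proxy for an application that cannot call the API itself."""

    def __init__(self, name: str, argv: List[str],
                 report: Callable[[str, bool], None]) -> None:
        self.name = name
        self.argv = argv
        self.report = report  # stands in for the management API call
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> None:
        # The executable is launched via the wrapper, not directly.
        self.process = subprocess.Popen(self.argv)

    def healthcheck(self) -> bool:
        # poll() returns None while the child process is still running.
        healthy = self.process is not None and self.process.poll() is None
        self.report(self.name, healthy)
        return healthy

    def stop(self, timeout: float = 5.0) -> None:
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Escalate if the child ignores the termination request.
                self.process.kill()
                self.process.wait()
```

Checking only that the process is alive is the weakest possible health test; a wrapper for a real application could add application-specific probes where they are available.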

Deploying the Product


• The OASM product was successfully deployed on the USS Bunker Hill (CG-52) as part of the Navy’s Cruiser Modernization effort

•  OASM  will  be  deployed  as  a  part  of  the  Aegis  Modernization  efforts 
on  destroyers  and  cruisers 

System Level Integration


•  Integrating  multiple  product  areas  with  OASM  went  fairly  well 

-  Minor  network  configuration  changes  needed 

•  Our  system  model  became  large  and  complex 

-  Ultimately  we  revisited  our  node  cluster  organization 

• OASM logs became critical for a first-level assessment of multi-component failures

•  Data  recording  of  key  events  provides  critical  information  for 
performance  analysis  and  root  cause  analysis  for  failures 

Summary


•  Commercial  standards  may  not  fulfill  all  of  your  requirements 

-  Standards  can  be  augmented  with  “value  add”  services 

•  Look  for  COTS  solutions  that  are  extensible  and  mature 

•  Find  vendors  who  are  willing  to  modify  their  product  when  needed 

•  Evaluate  the  product  in  terms  of  its  functional  capabilities  and 
performance  to  gain  confidence  in  it 

•  Recognize  that  some  components  will  have  unique  requirements 
and  need  to  be  managed  differently 

•  Encourage  design  teams  to  meet  often  and  openly  discuss 
requirements 

•  Stand  up  an  integration  team  and  embed  them  with  product 
developers 

Acronyms


•  AIS  -  Application  Interface  Specification 

• AMF - Availability Management Framework

•  AOA  -  Aegis  Open  Architecture 

• COTS - Commercial Off-the-Shelf

•  DDS  -  Data  Distribution  Service 

•  HA  -  High  Availability 

•  OASM  -  Open  Architecture  System  Management 

• SAF - Service Availability Forum

•  SNMP  -  Simple  Network  Management  Protocol 